-
Abstract: Data consisting of a graph with a function mapping into $$\mathbb{R}^d$$ arise in many data applications, encompassing structures such as Reeb graphs, geometric graphs, and knot embeddings. As such, the ability to compare and cluster such objects is required in a data analysis pipeline, leading to a need for distances between them. In this work, we study the interleaving distance on discretizations of these objects, called mapper graphs when $$d=1$$, where functor representations of the data can be compared by finding pairs of natural transformations between them. However, in many cases, computation of the interleaving distance is NP-hard. For this reason, we take inspiration from recent work by Robinson to find quality measures for families of maps that do not rise to the level of a natural transformation, called assignments. We then endow the functor images with the extra structure of a metric space and define a loss function which measures how far an assignment is from making the required diagrams of an interleaving commute. Finally, we show that the loss function can be computed in polynomial time for a given assignment. We believe this idea is both powerful and translatable, with the potential to provide approximations and bounds on interleavings in a broad array of contexts.
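The idea of measuring how far a diagram is from commuting can be illustrated with a toy sketch. This is not the loss function defined in the paper: the names `diagram_defect` and `assignment_loss`, the use of a maximum, and the integer-valued maps are all assumptions chosen for illustration. Each diagram is modelled as two composite maps out of a common finite source, and the defect is the largest distance between their outputs.

```python
# Toy illustration only: the paper's actual loss is not reproduced here.
# A "diagram" is a triple (path_a, path_b, source) where path_a and path_b
# are the two composite maps around a square, both defined on `source`.

def diagram_defect(path_a, path_b, source, metric):
    """Largest distance between the two composites over the source set.

    A defect of 0 means the diagram commutes on every element of `source`.
    """
    return max(metric(path_a(x), path_b(x)) for x in source)


def assignment_loss(diagrams, metric):
    """Worst defect over all diagrams (hypothetical aggregation by max)."""
    return max(diagram_defect(a, b, src, metric) for a, b, src in diagrams)


# Hypothetical data: maps on integers, compared with absolute difference.
metric = lambda u, v: abs(u - v)
source = [0, 1, 2]
commuting = (lambda x: 2 * x, lambda x: x + x, source)
off_by_one = (lambda x: 2 * x, lambda x: 2 * x + (1 if x == 2 else 0), source)

loss = assignment_loss([commuting, off_by_one], metric)
print(loss)
```

Each diagram is checked with one pass over its source set, so the cost of this toy loss grows linearly in the total number of elements, which is in the spirit of the polynomial-time claim above.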
-
Let $$\R$$ be a real closed field and $$\C$$ the algebraic closure of $$\R$$. We give an algorithm for computing a semi-algebraic basis for the first homology group, $$\HH_1(S,\FF)$$, with coefficients in a field $$\FF$$, of any given semi-algebraic set $$S \subset \R^k$$ defined by a closed formula. The complexity of the algorithm is bounded singly exponentially. More precisely, if the given quantifier-free formula involves $$s$$ polynomials whose degrees are bounded by $$d$$, the complexity of the algorithm is bounded by $$(s d)^{k^{O(1)}}$$. This algorithm generalizes well-known algorithms having singly exponential complexity for computing a semi-algebraic basis of the zero-th homology group of semi-algebraic sets, which is equivalent to the problem of computing a set of points meeting every semi-algebraically connected component of the given semi-algebraic set at a unique point. It is not known how to compute such a basis for the higher homology groups with singly exponential complexity. As an intermediate step in our algorithm we construct a semi-algebraic subset $$\Gamma$$ of the given semi-algebraic set $$S$$, such that $$\HH_q(S,\Gamma) = 0$$ for $$q=0,1$$. We relate this construction to a basic theorem in complex algebraic geometry stating that for any affine variety $$X$$ of dimension $$n$$, there exist Zariski closed subsets \[ Z^{(n-1)} \supset \cdots \supset Z^{(1)} \supset Z^{(0)} \] with $$\dim_\C Z^{(i)} \leq i$$, and $$\HH_q(X,Z^{(i)}) = 0$$ for $$0 \leq q \leq i$$. We conjecture a quantitative version of this result in the semi-algebraic category, with $$X$$ and $$Z^{(i)}$$ replaced by closed semi-algebraic sets. We make initial progress on this conjecture by proving the existence of $$Z^{(0)}$$ and $$Z^{(1)}$$ with complexity bounded singly exponentially (previously, such an algorithm was known only for constructing $$Z^{(0)}$$).
-
Bollenbach, Tobias (Ed.) Leaves are often described in language that evokes a single shape. However, embedded in that descriptor is a multitude of latent shapes arising from evolutionary, developmental, environmental, and other effects. These confounded effects manifest at distinct developmental time points and evolve at different tempos. Here, revisiting datasets comprising thousands of leaves of vining grapevine (Vitaceae) and maracuyá (Passifloraceae) species, we apply a technique from the mathematical field of topological data analysis to comparatively visualize the structure of heteroblastic and ontogenetic effects on leaf shape in each group. Consistent with a morphologically closer relationship, members of the grapevine dataset possess strong core heteroblasty and ontogenetic programs with little deviation between species. Remarkably, we found that most members of the maracuyá family also share core heteroblasty and ontogenetic programs despite dramatic species-to-species leaf shape differences. This conservation was not initially detected using traditional analyses such as principal component analysis or linear discriminant analysis. We also identify two morphotypes of maracuyá that deviate from the core structure, suggesting the evolution of new developmental properties in this phylogenetically distinct sub-group. Our findings illustrate how topological data analysis can be used to disentangle previously confounded developmental and evolutionary effects to visualize latent shapes and hidden relationships, even ones embedded in complex, high-dimensional datasets.
-
Abstract. Premise: The selection of Arabidopsis as a model organism played a pivotal role in advancing genomic science. The competing frameworks to select an agricultural- or ecological-based model species were rejected in favor of building knowledge in a species that would facilitate genome-enabled research. Methods: Here, we examine the ability of models based on Arabidopsis gene expression data to predict tissue identity in other flowering plants. Comparing different machine learning algorithms, models trained and tested on Arabidopsis data achieved near-perfect precision and recall values, whereas when tissue identity is predicted across the flowering plants using models trained on Arabidopsis data, precision values range from 0.69 to 0.74 and recall from 0.54 to 0.64. Results: The identity of belowground tissue can be predicted more accurately than other tissue types, and the ability to predict tissue identity is not correlated with phylogenetic distance from Arabidopsis. k-nearest neighbors is the most successful algorithm, suggesting that gene expression signatures, rather than marker genes, are more valuable to create models for tissue and cell type prediction in plants. Discussion: Our data-driven results highlight that the assertion that knowledge from Arabidopsis is translatable to other plants is not always true. Considering the current landscape of abundant sequencing data, we should reevaluate the scientific emphasis on Arabidopsis and prioritize plant diversity. Free, publicly-accessible full text available January 1, 2026.
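To make the evaluation terms concrete, here is a minimal sketch of k-nearest-neighbors classification with per-label precision and recall, using only the Python standard library. It is not the authors' pipeline: the two-feature "expression" vectors, the `leaf`/`root` labels, and the function names `knn_predict` and `precision_recall` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import dist


def knn_predict(train_x, train_y, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    nearest = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], query))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def precision_recall(y_true, y_pred, label):
    """Precision and recall for one label, treating it as the positive class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical two-gene expression profiles (made-up numbers).
train_x = [(9.0, 1.0), (8.5, 1.5), (8.0, 0.5), (1.0, 9.0), (1.5, 8.0), (0.5, 8.5)]
train_y = ["leaf", "leaf", "leaf", "root", "root", "root"]
test_x = [(8.0, 2.0), (2.0, 8.0), (5.5, 5.0), (1.0, 7.0)]
test_y = ["leaf", "root", "root", "root"]

preds = [knn_predict(train_x, train_y, q, k=3) for q in test_x]
root_precision, root_recall = precision_recall(test_y, preds, "root")
print(preds, root_precision, root_recall)
```

In this toy data the ambiguous sample `(5.5, 5.0)` is a root sample that lands closer to the leaf profiles, so recall for `root` drops while its precision stays perfect. That is the same kind of gap between precision and recall that the abstract reports for cross-species prediction.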
-
The field of plant science has grown dramatically in the past two decades, but global disparities and systemic inequalities persist. Here, we analyzed ~300,000 papers published over the past two decades to quantify disparities across nations, genders, and taxonomy in the plant science literature. Our analyses reveal striking geographical biases: affluent nations dominate the publishing landscape and vast areas of the globe have virtually no footprint in the literature. Authors in Northern America are cited nearly twice as many times as authors based in Sub-Saharan Africa and Latin America, despite publishing in journals with similar impact factors. Gender imbalances are similarly stark and show remarkably little improvement over time. Some of the most affluent nations have extremely male-biased publication records, despite supposed improvements in gender equality. In addition, we find that most studies focus on economically important crop and model species, and a wealth of biodiversity is underrepresented in the literature. Taken together, our analyses reveal a problematic system of publication, with persistent imbalances that poorly capture the global wealth of scientific knowledge and biological diversity. We conclude by highlighting disparities that can be addressed immediately and offer suggestions for long-term solutions to improve equity in the plant sciences.
-
Drost, Hajk-Georg (Ed.) Since they emerged approximately 125 million years ago, flowering plants have evolved to dominate the terrestrial landscape and survive in the most inhospitable environments on earth. At their core, these adaptations have been shaped by changes in numerous, interconnected pathways and genes that collectively give rise to emergent biological phenomena. Linking gene expression to morphological outcomes remains a grand challenge in biology, and new approaches are needed to begin to address this gap. Here, we implemented topological data analysis (TDA) to summarize the high dimensionality and noisiness of gene expression data using lens functions that delineate plant tissue and stress responses. Using this framework, we created a topological representation of the shape of gene expression across plant evolution, development, and environment for the phylogenetically diverse flowering plants. The TDA-based Mapper graphs form a well-defined gradient of tissues from leaves to seeds, or from healthy to stressed samples, depending on the lens function. This suggests that there are distinct and conserved expression patterns across angiosperms that delineate different tissue types or responses to biotic and abiotic stresses. Genes that correlate with the tissue lens function are enriched in central processes such as photosynthesis, growth and development, housekeeping, or stress responses. Together, our results highlight the power of TDA for analyzing complex biological data and reveal a core expression backbone that defines plant form and function.
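For readers unfamiliar with Mapper, the construction can be sketched in a few lines of standard-library Python. This is a simplified illustration, not the authors' implementation: the function names, the single-linkage clustering with a fixed threshold `eps`, and the toy points are all assumptions. The lens values are covered by overlapping intervals, the points in each interval are clustered, each cluster becomes a node, and two nodes are joined when their clusters share a point.

```python
from itertools import combinations
from math import dist


def single_linkage(points, idx, eps):
    """Split indices in idx into connected components of the eps-neighbor graph."""
    clusters, unseen = [], set(idx)
    while unseen:
        stack = [unseen.pop()]
        comp = set(stack)
        while stack:
            i = stack.pop()
            near = {j for j in unseen if dist(points[i], points[j]) <= eps}
            unseen -= near
            comp |= near
            stack.extend(near)
        clusters.append(frozenset(comp))
    return clusters


def mapper_graph(points, lens, n_intervals, overlap, eps):
    """Return (nodes, edges): nodes are clusters, edges join clusters sharing points."""
    lo, hi = min(lens), max(lens)
    width = (hi - lo) / n_intervals
    pad = width * overlap / 2  # each interval is widened by this much on both sides
    nodes = []
    for k in range(n_intervals):
        a, b = lo + k * width - pad, lo + (k + 1) * width + pad
        idx = [i for i, v in enumerate(lens) if a <= v <= b]
        nodes.extend(single_linkage(points, idx, eps))
    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2) if nodes[u] & nodes[v]}
    return nodes, edges


# Hypothetical data: ten points on a line, with the x-coordinate as the lens.
points = [(float(x), 0.0) for x in range(10)]
lens = [p[0] for p in points]
nodes, edges = mapper_graph(points, lens, n_intervals=3, overlap=0.5, eps=1.5)
print(len(nodes), sorted(edges))
```

On this toy input the graph is a path of three nodes, a small analogue of the gradient from one end of a lens function to the other that the abstract describes for tissues and stress responses.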